Bay of Bengal
Learning under Latent Group Sparsity via Diffusion on Networks
Ghosh, Subhroshekhar, Mukherjee, Soumendu Sundar
Group or cluster structure on explanatory variables in machine learning problems is a very general phenomenon, which has attracted broad interest from practitioners and theoreticians alike. In this work we contribute an approach to sparse learning under such group structure, that does not require prior information on the group identities. Our paradigm is motivated by the Laplacian geometry of an underlying network with a related community structure, and proceeds by directly incorporating this into a penalty that is effectively computed via a heat-flow-based local network dynamics. The proposed penalty interpolates between the lasso and the group lasso penalties, the runtime of the heat-flow dynamics being the interpolating parameter. As such it can automatically default to lasso when the group structure reflected in the Laplacian is weak. In fact, we demonstrate a data-driven procedure to construct such a network based on the available data. Notably, we dispense with computationally intensive pre-processing involving clustering of variables, spectral or otherwise. Our technique is underpinned by rigorous theorems that guarantee its effective performance and provide bounds on its sample complexity. In particular, in a wide range of settings, it provably suffices to run the diffusion for time that is only logarithmic in the problem dimensions. We explore in detail the interfaces of our approach with key statistical physics models in network science, such as the Gaussian Free Field and the Stochastic Block Model. Our work raises the possibility of applying similar diffusion-based techniques to classical learning tasks, exploiting the interplay between geometric, dynamical and stochastic structures underlying the data.
Spatiotemporal deep learning models for detection of rapid intensification in cyclones
Sutar, Vamshika, Singh, Amandeep, Chandra, Rohitash
Cyclone rapid intensification is the rapid increase in cyclone wind intensity, exceeding a threshold of 30 knots, within 24 hours. Rapid intensification is considered an extreme event during a cyclone, and its occurrence is relatively rare, contributing to a class imbalance in the dataset. A diverse array of factors influences the likelihood of a cyclone undergoing rapid intensification, further complicating the task for conventional machine learning models. In this paper, we evaluate deep learning, ensemble learning and data augmentation frameworks to detect cyclone rapid intensification based on wind intensity and spatial coordinates. We note that conventional data augmentation methods cannot be utilised for generating spatiotemporal patterns replicating cyclones that undergo rapid intensification. Therefore, our framework employs deep learning models to generate spatial coordinates and wind intensity that replicate cyclones to address the class imbalance problem of rapid intensification. We also use a deep learning model for the classification module within the data augmentation framework to di fferentiate between rapid and non-rapid intensification events during a cyclone. Our results show that data augmentation improves the results for rapid intensification detection in cyclones, and spatial coordinates play a critical role as input features to the given models. This paves the way for research in synthetic data generation for spatiotemporal data with extreme events. Introduction Over the past decade, the impacts of climate change have manifested in an alarming increase in the strength of tropical cyclones, characterised by elevated levels of precipitation and wind intensity, resulting in devastating consequences on a global scale [1, 2, 3]. Rappaport et al. [4] defined rapid intensification as a sudden surge in wind intensity exceeding 30 knots (35 miles / hour or 55 kilometres / hour) within 24 hours [5]. Forecasting the rapid intensification of high-category cyclones (Category 4 and 5) poses greater challenges due to their infrequent occurrence, in contrast to lower-category cyclones[6].
Data Driven Deep Learning for Correcting Global Climate Model Projections of SST and DSL in the Bay of Bengal
Pasula, Abhishek, Subramani, Deepak N.
Climate change alters ocean conditions, notably temperature and sea level. In the Bay of Bengal, these changes influence monsoon precipitation and marine productivity, critical to the Indian economy. In Phase 6 of the Coupled Model Intercomparison Project (CMIP6), Global Climate Models (GCMs) use different shared socioeconomic pathways (SSPs) to obtain future climate projections. However, significant discrepancies are observed between these models and the reanalysis data in the Bay of Bengal for 2015-2024. Specifically, the root mean square error (RMSE) between the climate model output and the Ocean Reanalysis System (ORAS5) is 1.2C for the sea surface temperature (SST) and 1.1 m for the dynamic sea level (DSL). We introduce a new data-driven deep learning model to correct for this bias. The deep neural model for each variable is trained using pairs of climatology-removed monthly climate projections as input and the corresponding month's ORAS5 as output. This model is trained with historical data (1950 to 2014), validated with future projection data from 2015 to 2020, and tested with future projections from 2021 to 2023. Compared to the conventional EquiDistant Cumulative Distribution Function (EDCDF) statistical method for bias correction in climate models, our approach decreases RMSE by 0.15C for SST and 0.3 m for DSL. The trained model subsequently corrects the projections for 2024-2100. A detailed analysis of the monthly, seasonal, and decadal means and variability is performed to underscore the implications of the novel dynamics uncovered in our corrected projections.
Integrating Boosted learning with Differential Evolution (DE) Optimizer: A Prediction of Groundwater Quality Risk Assessment in Odisha
Subudhi, Sonalika, Pati, Alok Kumar, Bose, Sephali, Sahoo, Subhasmita, Pattanaik, Avipsa, Acharya, Biswa Mohan
Groundwater is eventually undermined by human exercises, such as fast industrialization, urbanization, over-extraction, and contamination from agrarian and urban sources. From among the different contaminants, the presence of heavy metals like cadmium (Cd), chromium (Cr), arsenic (As), and lead (Pb) proves to have serious dangers when present in huge concentrations in groundwater. Long-term usage of these poisonous components may lead to neurological disorders, kidney failure and different sorts of cancer. To address these issues, this study developed a machine learning-based predictive model to evaluate the Groundwater Quality Index (GWQI) and identify the main contaminants which are affecting the water quality. It has been achieved with the help of a hybrid machine learning model i.e. LCBoost Fusion . The model has undergone several processes like data preprocessing, hyperparameter tuning using Differential Evolution (DE) optimization, and evaluation through cross-validation. The LCBoost Fusion model outperforms individual models (CatBoost and LightGBM), by achieving low RMSE (0.6829), MSE (0.5102), MAE (0.3147) and a high R$^2$ score of 0.9809. Feature importance analysis highlights Potassium (K), Fluoride (F) and Total Hardness (TH) as the most influential indicators of groundwater contamination. This research successfully demonstrates the application of machine learning in assessing groundwater quality risks in Odisha. The proposed LCBoost Fusion model offers a reliable and efficient approach for real-time groundwater monitoring and risk mitigation. These findings will help the environmental organizations and the policy makers to map out targeted places for sustainable groundwater management. Future work will focus on using remote sensing data and developing an interactive decision-making system for groundwater quality assessment.
LASSE: Learning Active Sampling for Storm Tide Extremes in Non-Stationary Climate Regimes
Jiang, Grace, Qiu, Jiangchao, Ravela, Sai
Identifying tropical cyclones that generate destructive storm tides for risk assessment, such as from large downscaled storm catalogs for climate studies, is often intractable because it entails many expensive Monte Carlo hydrodynamic simulations. Here, we show that surrogate models are promising from accuracy, recall, and precision perspectives, and they "generalize" to novel climate scenarios. We then present an informative online learning approach to rapidly search for extreme storm tide-producing cyclones using only a few hydrodynamic simulations. Starting from a minimal subset of TCs with detailed storm tide hydrodynamic simulations, a surrogate model selects informative data to retrain online and iteratively improves its predictions of damaging TCs. Results on an extensive catalog of downscaled TCs indicate 100% precision in retrieving rare destructive storms using less than 20% of the simulations as training. The informative sampling approach is efficient, scalable to large storm catalogs, and generalizable to climate scenarios.
Self-attentive Transformer for Fast and Accurate Postprocessing of Temperature and Wind Speed Forecasts
Van Poecke, Aaron, Finn, Tobias Sebastian, Meng, Ruoke, Bergh, Joris Van den, Smet, Geert, Demaeyer, Jonathan, Termonia, Piet, Tabari, Hossein, Hellinckx, Peter
Current postprocessing techniques often require separate models for each lead time and disregard possible inter-ensemble relationships by either correcting each member separately or by employing distributional approaches. In this work, we tackle these shortcomings with an innovative, fast and accurate Transformer which postprocesses each ensemble member individually while allowing information exchange across variables, spatial dimensions and lead times by means of multi-headed self-attention. Weather foreacasts are postprocessed over 20 lead times simultaneously while including up to twelve meteorological predictors. We use the EUPPBench dataset for training which contains ensemble predictions from the European Center for Medium-range Weather Forecasts' integrated forecasting system alongside corresponding observations. The work presented here is the first to postprocess the ten and one hundred-meter wind speed forecasts within this benchmark dataset, while also correcting the two-meter temperature. Our approach significantly improves the original forecasts, as measured by the CRPS, with 17.5 % for two-meter temperature, nearly 5% for ten-meter wind speed and 5.3 % for one hundred-meter wind speed, outperforming a classical member-by-member approach employed as competitive benchmark. Furthermore, being up to 75 times faster, it fulfills the demand for rapid operational weather forecasts in various downstream applications, including renewable energy forecasting.
The Shipwreck Detective
The wreck was like a bug on the wall, a jumbly shape splayed on the abyssal plain. It was noticed by a team of autonomous-underwater-vehicle operators on board a subsea exploration vessel, working at an undisclosed location in the Atlantic Ocean, about a thousand miles from the nearest shore. The analysts belonged to a small private company that specializes in deep-sea search operations; I have been asked not to name it. They were looking for something else. In the past decade, the company has helped to transform the exploration of the seabed by deploying fleets of A.U.V.s--underwater drones--which cruise in formation, mapping large areas of the ocean floor with high-definition imagery.